Towards Semantic Dataset Profiling

Authors

  • Mohamed Ben Ellefi
  • Zohra Bellahsene
  • François Scharffe
  • Konstantin Todorov
Abstract

The web of data is growing constantly, in both size and impact. A prospective data publisher needs summary information about the datasets available on the web, so that she can easily identify where to look for resources related to her own data. This information helps discover candidate datasets for interlinking. In this context, we investigate the problem of dataset profiling. We define a dataset profile as a set of characteristics, both semantic and statistical, that describe a dataset as well as possible while taking into account the multiplicity of domains and vocabularies on the web of data.

Similar resources

Towards constructing an Integrative, Multi-Level Model for Cognition: The Function of Semantic Networks

Integrated approaches try to connect different constructs in different theories and reinterpret them using a common conceptual framework. In this research, using the concept of processing levels, an integrated, three-level model of the cognitive systems has been proposed and evaluated. Processing levels are divided into three categories of Feature-Oriented, Semantic and Conceptual Level based o...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

A Persian-English Cross-Linguistic Dataset for Research on the Visual Processing of Cognates and Noncognates

Finding out which lexico-semantic features of cognates are critical in cross-language studies and comparing these features with noncognates helps researchers to decide which features to control in studies with cognates. Normative databases provide necessary information for this purpose. Such resources are lacking in the Persian language. We created a dataset and determined norms for the essenti...

How Would You Say It? Eliciting Lexically Diverse Data for Supervised Semantic Parsing

Building dialogue interfaces for real-world scenarios often entails training semantic parsers starting from zero examples. How can we build datasets that better capture the variety of ways users might phrase their queries, and what queries are actually realistic? Wang et al. (2015) proposed a method to build semantic parsing datasets by generating canonical utterances using a grammar and having ...

Towards a Linked Open Dataset for Scholarly Publishing: Semantic Lancet Project

There is an ever increasing interest in publishing Linked Open Datasets about scientific papers. The current landscape is very fragmented: some projects focus on bibliographic data, others on authorship data, others on citations, and so on. The quality is also heterogeneous and the production and maintenance of such datasets is difficult and time-consuming. In this paper we introduce the Semant...

Publication date: 2014